For the past 15 years, microprocessor performance has largely been driven by improvements in clock frequency, which has doubled every two years. However, because of microarchitectural pipelining limits and device limits, this frequency growth will slow significantly, and in fact has already slowed since 2004. Hence, future processors must exploit concurrency to provide performance improvements.
In this talk, I will describe novel architecture and microarchitecture mechanisms to exploit concurrency, developed as part of the TRIPS project at UT-Austin. First, I describe EDGE ISAs, a new class of instruction set architectures (ISAs) that efficiently express concurrency to the hardware through dataflow graphs. Second, I describe the TRIPS processor, which employs a distributed microarchitecture to implement an EDGE ISA. I will explain the key design principles and insights in both the ISA and the microarchitecture, focusing on three main aspects: (1) amortizing the overheads of bookkeeping for each instruction, (2) expressing dependences efficiently to the hardware, and (3) eliminating centralized resources.
Using this approach, we support programs with both irregular and regular concurrency. We use predication and other compiler heuristics to build large blocks of instructions and express the dependences within each block explicitly in the ISA. To support irregular concurrency, we use control speculation and exploit locality in the hardware to mine concurrency. For regular concurrency, where the compiler can identify the parallelism, we studied the fundamental properties of these programs. We discovered many similarities and some surprising differences in the memory behavior and control flow of these programs. Based on this study, we developed a small set of spanning microarchitecture mechanisms to execute these programs more efficiently. In this talk, I will also briefly describe the implementation of the prototype TRIPS chip we have built at UT-Austin.
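
As a rough illustration of what it means to express the dependences within a block as a dataflow graph, the following Python sketch models a block in which each instruction names its consumers directly and fires once both of its operands have arrived. The `Instr` class, the `(consumer index, operand slot)` target encoding, and the `run_block` function are all hypothetical names chosen for this sketch; they are not the TRIPS instruction format or hardware, only a toy model of dataflow firing.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Instr:
    """One two-input instruction in a block (hypothetical encoding)."""
    op: Callable[[int, int], int]
    # Consumers of this instruction's result: (instruction index, operand slot).
    targets: List[Tuple[int, int]]
    # Operand values that have arrived so far, keyed by slot (0 or 1).
    operands: Dict[int, int] = field(default_factory=dict)


def run_block(instrs, inputs):
    """Execute a block in dataflow order.

    inputs: (instruction index, operand slot, value) triples delivered at
    block entry. Returns the results of instructions that have no targets,
    which this sketch treats as the block's outputs.
    """
    outputs = {}
    ready = []  # indices of instructions whose operands are all present

    def deliver(idx, slot, value):
        ins = instrs[idx]
        ins.operands[slot] = value
        if len(ins.operands) == 2:
            ready.append(idx)

    for idx, slot, value in inputs:
        deliver(idx, slot, value)

    while ready:
        idx = ready.pop()
        ins = instrs[idx]
        result = ins.op(ins.operands[0], ins.operands[1])
        if not ins.targets:
            outputs[idx] = result
        for t_idx, t_slot in ins.targets:
            deliver(t_idx, t_slot, result)
    return outputs


# Block computing (a + b) * (c - d) with a=3, b=4, c=10, d=5.
block = [
    Instr(operator.add, targets=[(2, 0)]),  # feeds slot 0 of the multiply
    Instr(operator.sub, targets=[(2, 1)]),  # feeds slot 1 of the multiply
    Instr(operator.mul, targets=[]),        # block output
]
result = run_block(block, [(0, 0, 3), (0, 1, 4), (1, 0, 10), (1, 1, 5)])
print(result)  # {2: 35}
```

In this model the add and subtract have no dependence on each other, so either may fire first; the order is decided only by operand arrival, not by program order or a shared register file.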